3 research outputs found
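Implementation of a Density Parameter Determination Algorithm in the DBSCAN Method for Data Clustering (Implementasi Algoritma Penentuan Parameter Densitas Pada Metode Dbscan Untuk Pengelompokan Data)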
DBSCAN is a density-based clustering algorithm. When the data contain regions of varying density, DBSCAN's clustering results are suboptimal because a single, global density parameter value is applied to the entire dataset. Our implementation addresses this problem with a modified DBSCAN in which the density parameter value differs for each cluster. The density parameter values are obtained from the k-nearest neighbors of several data points, so that the points selected are not noise or outliers.
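A minimal sketch of how a k-nearest-neighbor distance can suggest a different density parameter for regions of different density. The abstract does not give the exact procedure, so the helper names `k_distance` and `local_eps`, the use of the median, and the toy points are all assumptions made for illustration.

```python
import math
from statistics import median

def k_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor (itself excluded)."""
    p = points[index]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != index)
    return dists[k - 1]

def local_eps(points, member_indices, k):
    """Hypothetical per-cluster eps: the median k-distance of the given points.

    Using the median (an assumption, not the thesis's rule) keeps a single
    far-away point from inflating the estimate.
    """
    return median(k_distance(points, i, k) for i in member_indices)

# Toy data: a dense group near the origin and a sparse group far away.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
points = dense + sparse

eps_dense = local_eps(points, range(0, 4), k=2)
eps_sparse = local_eps(points, range(4, 8), k=2)
print(eps_dense, eps_sparse)
```

A single global value would either split the sparse group into noise or merge everything near the dense group, which is the limitation the modified DBSCAN targets.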
Our experiments compared the clustering results of standard DBSCAN and the modified DBSCAN, using the Dunn Index as the cluster validity measure. By this measure the modified DBSCAN did not outperform standard DBSCAN: the average Dunn Index values were 0.12 for the modified DBSCAN and 0.146 for standard DBSCAN. We also compared the cluster labels produced by each method with the labels in the ground-truth dataset. In this labeling experiment, the modified DBSCAN produced clusters that matched the ground truth more closely than those of unmodified DBSCAN.
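The Dunn Index can be sketched as the smallest distance between points in different clusters divided by the largest cluster diameter, so higher values indicate compact, well-separated clusters. The abstract does not say which variant of the index was used; the classic definition with Euclidean distance is assumed here, and the toy clusters are invented for illustration. Points that DBSCAN labels as noise would have to be excluded before calling the function.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Classic Dunn Index for a list of clusters, each a list of points."""
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("Dunn index is undefined when every cluster has zero diameter")
    return min_separation / max_diameter

# Two toy clusters, each of diameter 1, whose closest points are 3 apart.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (3.0, 1.0)]
dunn = dunn_index([cluster_a, cluster_b])
print(dunn)
```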
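Conflict of Interest Feature Extraction on Scientific Articles to Determine Author Citation Quality (Ekstraksi Fitur Conflict of Interest pada Artikel Ilmiah Untuk Menentukan Kualitas Citation Author)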
Citations of a scientific publication affect the perceived quality of the article and, in turn, the credibility of its author. Researchers can raise their credibility in many ways, one of which is citing their own work (self-citation). Excessive self-citation, however, lowers the citation quality of a paper and makes bibliometric calculations less accurate, because those calculations do not consider citation quality. Several studies have proposed methods for measuring inappropriate self-citation, one of which uses the self-citation ratio within a time window. That method does not consider how closely the topic of the main paper is related to that of the cited paper. A measure of an author's citation quality is therefore needed to determine whether the author often uses off-topic citations, based on the author's paper and each cited paper.
This research proposes a conflict-of-interest feature extraction method for determining the citation quality of authors of scientific articles, which indicates how appropriately an author uses citations. Two features are proposed. The first is the conflict of interest feature, obtained from conflicts of interest between the authors of a paper and the authors of the papers it cites. The second is the content similarity feature, obtained from the topical similarity between a paper and the papers it cites. Document similarity is computed with a deep learning approach, a Siamese neural network combined with Long Short-Term Memory (LSTM), which can provide better similarity results based on training data. Finally, all features are combined with the self-citation count feature from previous research and classified to determine the author's citation quality.
The features are evaluated by their classification performance, with accuracy as the measure. The results show that the proposed features can be used to classify an author's citation quality: accuracy reached 66.67% with a Random Forest classifier, and the average accuracy across the three classifiers used was 62%.
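As a point of reference for the self-citation ratio mentioned above, here is a minimal sketch of one plausible definition: the share of a paper's references that have at least one author in common with the paper itself. The function name, the name-matching rule, and the sample author names are assumptions for illustration; no time window is applied, and real author disambiguation is considerably harder than lowercase string matching.

```python
def self_citation_ratio(paper_authors, references):
    """Fraction of references sharing at least one author with the citing paper.

    paper_authors: list of author names of the citing paper.
    references: list of author-name lists, one list per cited paper.
    """
    if not references:
        return 0.0
    # Naive normalisation (an assumption): trim whitespace and ignore case.
    authors = {name.strip().lower() for name in paper_authors}
    self_cites = sum(
        1
        for ref_authors in references
        if authors & {name.strip().lower() for name in ref_authors}
    )
    return self_cites / len(references)

# Hypothetical paper with four references, two of them by its own authors.
paper = ["A. Author", "B. Writer"]
refs = [
    ["A. Author", "C. Other"],
    ["D. Else"],
    ["b. writer"],
    ["E. Person"],
]
ratio = self_citation_ratio(paper, refs)
print(ratio)
```

A ratio like this says nothing about whether the cited work is on topic, which is the gap the content similarity feature is meant to close.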
Siamese Long Short-Term Memory for Detecting Conflict of Interest on Scientific Papers
Scientific articles that are cited by other researchers increase their authors' credibility. However, the citation process may be misused to inflate bibliometric indicators such as a researcher's h-index. Researchers may cite their own works excessively, a practice referred to as self-citation, even when the topics of those references are unrelated to the current article. A further form of misconduct is excessive citation of works by people connected to the researcher, whether coerced or not, referred to as conflict of interest (CoI). The proposed method uses a deep learning approach, Siamese Long Short-Term Memory (LSTM), to recognize subject similarity between a scientific article and its references. Standard text similarity measures fail at this because the contextual relatedness of sentences in the articles has to be learned. Siamese-LSTM learns the contextual relatedness of sentences using two identical LSTMs. The steps of the proposed method are (i) word embedding, to obtain weight values for terms while still considering their semantic relations; (ii) k-means clustering, to generate training data and so reduce the time complexity of Siamese-LSTM learning on scientific articles; (iii) learning the Siamese-LSTM weights from the training data to identify the contextual relatedness of sentences; and (iv) calculating the similarity of a scientific article to its references based on the Siamese-LSTM. Empirical experiments are used to analyze the similarity values and the possibility of conflict of interest in an article.
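The "two identical LSTMs" idea can be sketched as a single encoder applied to both inputs, so that the two branches share every weight. The code below is only an illustration of that structure: the weights are random and untrained, the class and function names are hypothetical, the two-dimensional "word embeddings" are invented, and the similarity function (the exponential of the negative L1 distance between the final hidden states) is an assumption, since the abstract does not state which one the paper uses.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Single-layer LSTM encoder with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1  # +1 column for the bias term
        # One weight matrix per gate: input, forget, output, candidate.
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in ("i", "f", "o", "g")
        }

    def _gate(self, name, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)))
                for row in self.weights[name]]

    def encode(self, sequence):
        """Run the sequence through the LSTM and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h + [1.0]  # input, previous hidden state, bias
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

def siamese_similarity(encoder, seq_a, seq_b):
    """Encode both sequences with the same encoder; map L1 distance into (0, 1]."""
    ha, hb = encoder.encode(seq_a), encoder.encode(seq_b)
    return math.exp(-sum(abs(a - b) for a, b in zip(ha, hb)))

encoder = TinyLSTM(input_size=2, hidden_size=3)
article = [[0.2, 0.1], [0.4, 0.3]]      # toy embeddings of an article sentence
reference = [[-0.9, 0.8], [0.7, -0.6]]  # toy embeddings of a reference sentence

sim_same = siamese_similarity(encoder, article, article)
sim_other = siamese_similarity(encoder, article, reference)
print(sim_same, sim_other)
```

Because both inputs pass through the same weights, identical inputs always score 1 and the score is symmetric. Training, which corresponds to step (iii), would adjust the shared weights so that topically related pairs score high and unrelated pairs score low.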